Abstract: We develop novel data dissemination and collection
algorithms for Wireless Sensor Networks (WSNs) in which n sensor
nodes are distributed randomly in a field to measure a physical
phenomenon. Such sensors have limited energy, a short coverage
range, and bandwidth and memory constraints. We aim to
disseminate the nodes' data throughout the network so that a base
station can collect the sensed data by querying a small number of
nodes. We propose two data dissemination and collection
algorithms (DCAs) to solve this problem. Data dissemination is
achieved through dynamic selection of some nodes. The selected
nodes are changed after a time slot t and may be repeated after a
period T.

I. Introduction 

Wireless Sensor Networks (WSNs) are expanding rapidly
due to their various applications and ease of development. However,
WSNs face several challenges to efficient deployment
in a given environment. These challenges include limited source
energy, limited transmission bandwidth, short coverage range,
data dissemination, data persistence, redundancy for defective
nodes and data security. A typical wireless sensor network
(WSN) can be used in many applications, such as monitoring
physical phenomena in the surrounding environment like
temperature, gases, humidity, volcanoes and tornadoes. It can
also be used in animal tracking, forest fire detection and
military applications such as detection of enemy intrusion.

Many techniques are used in data dissemination [11], [15]
and cluster head election [5], [15], [16]. Fountain codes and
random walks have been used to disseminate data from k
sources to a set of storage nodes r, see [12], [14]. The LEACH
algorithm [9] is the most popular clustering algorithm. Several
cluster head selection algorithms are based on the LEACH
architecture. The main drawback of the mentioned techniques is
the requirement that the positions of all sensors be known.
Our algorithms do not use Fountain codes or random walks and
are independent of the sensors' positions.

We consider a model for large-scale wireless sensor networks
with n identical sensing nodes distributed randomly
and uniformly in a certain field. The nodes do not know
the locations of the neighboring nodes as required in [7], and
they do not maintain routing tables. In this work, we propose
two algorithms for data dissemination and data collection in
wireless sensor networks. The first algorithm is the Pre-known
Head selection data Dissemination and Collection Algorithm
(PHDCA). The second algorithm is the Random Head selection
data Dissemination and Collection Algorithm (RHDCA). We
aim to develop an efficient method to randomly distribute and
collect information from n sensors, so that querying 10% to 20% of
the nodes retrieves information about all network nodes with
high probability.

Fig. 1. A model for WSNs with n nodes distributed randomly and uniformly;
the k cluster head nodes among them are shown in blue.

This work is organized as follows. In Section VI we
present a background and short survey of the related work. In
Section II we introduce the network model. In Sections III and
IV we describe the DCA algorithms and present some
analysis of them. In Section V we present
simulation studies of the proposed algorithms, and the paper
is concluded in Section VII.

II. Network Model 

In this section we present the network model. Consider a set
of n identical sensing nodes distributed randomly in a field F
of dimensions A = L × W, where L and W are the length and
the width of F, respectively. We assume that each node has
at least one neighboring node, meaning that with probability
P = 1 there are no isolated nodes.

Definition 1 (Cluster head): The cluster head node (HN)
is an arbitrary node among all nodes of the network N which
exchanges its neighbors' data with the other neighboring cluster
head nodes.

Definition 2 (Node degree): The node degree $d_{s_i}$ is the
number of neighboring nodes of the node $s_i$ within its coverage
range. The mean degree of all nodes in N is given
by

$$\mu = \frac{1}{n} \sum_{i=1}^{n} d_{s_i}. \tag{1}$$

The total period (T) is the period after which the sensed data
has been disseminated in the network N. It is divided into e
equal time slots:

$$T = e \times t, \tag{2}$$

for some integer e. The algorithms' performance and
simulation results confirm our theoretical bounds. The head
nodes consume more energy than other nodes because of the extra
transmissions needed for data dissemination and data collection.
Such nodes are therefore selected dynamically to spread
energy consumption fairly over all nodes. The dynamic
selection also improves the performance of data dissemination in
the network. The head nodes are changed every time slot
t. The number of head nodes in the network is k (where
k/n = 10%). The selection of T depends on the intended
application (i.e., T is small for high data rate applications and
large for low data rate applications).
Assumptions:

1) Let $S = \{s_1, \ldots, s_n\}$ be a set of n identical sensing
nodes distributed randomly in a field F of dimensions
A = L × W, where L and W are the length and the
width of F, respectively.

2) Let $H = \{h_1, \ldots, h_k\}$ be a set of k head nodes selected
from the n sensing nodes to disseminate the data in the
network; they are changed at each time slot t.

3) Let T be the period after which the sensed data has been
disseminated in the network; it is divided into e equal
slots $t \in \{t_1, \ldots, t_e\}$.

4) The nodes use flooding to discover their neighbors:
each node sends a message containing its $ID_{s_i}$ to
all neighboring nodes. Each node that receives an incoming
$ID_{s_i}$ from a node $s_i$ considers $s_i$
its neighbor.

5) Each node in the network generates a packet $P_{s_i}$ as
follows:

$$P_{s_i} = (ID_{s_i}, x_{s_i}, flag), \tag{3}$$

where $ID_{s_i}$ is the ID of node $s_i$, $x_{s_i}$ is the sensed data
of node $s_i$, and flag is a variable set to 0 in the flooding
process or to 1 otherwise.

6) Each node has a radio coverage range $r_i$. The node $s_i$
is considered a neighbor of $s_j$ if and only if
$d_{s_i,s_j} < r_j$, where $d_{s_i,s_j}$ is the distance between nodes
$s_i$ and $s_j$.

7) Initially, the number of nodes n is assumed to be known. In practice,
the number of nodes in the network varies due to node
energy depletion, node failures and added redundant
nodes. Hence, it is important to estimate the number of
nodes at each period T. The base station takes
the number of nodes retrieved when querying 10% of the
nodes as the total number of nodes n. The estimated
number of nodes is sent to the first surviving head
node in the network (i.e., the first surviving node from
H).
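
The neighbor relation of assumption 6 and the mean degree of Definition 2 can be illustrated with a small simulation sketch. This is not the authors' simulator: the function names `deploy`, `neighbors` and `mean_degree` are hypothetical, a single common range r is assumed for all nodes, and positions are used only by the simulation itself (the nodes in the model never learn them).

```python
import math
import random


def deploy(n, length, width, seed=0):
    """Place n nodes uniformly at random in a length x width field."""
    rng = random.Random(seed)
    return [(rng.uniform(0, length), rng.uniform(0, width)) for _ in range(n)]


def neighbors(positions, r):
    """For each node, list the nodes at distance strictly less than r."""
    n = len(positions)
    nbrs = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) < r:
                nbrs[i].append(j)
                nbrs[j].append(i)
    return nbrs


def mean_degree(nbrs):
    """Average number of neighbors per node (the mean degree)."""
    return sum(len(v) for v in nbrs) / len(nbrs)


# Illustrative values matching the simulation settings quoted later:
# n = 300 nodes, a 100 x 100 field, coverage radius r = 5.
positions = deploy(n=300, length=100, width=100)
nbrs = neighbors(positions, r=5)
mu = mean_degree(nbrs)
```

With a common range the relation is symmetric, so the flooding step of assumption 4 yields the same neighbor lists at both ends of every link.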

III. DCA Algorithms
In this section, we describe the DCA algorithms.

A. PHDCA Algorithm

In the PHDCA algorithm we dynamically select the k cluster
head nodes that disseminate the data in the network according
to a schedule known in advance. The algorithm can be divided into
four phases as follows:

• Initialization phase: In this phase, the head nodes are
initially selected as the nodes with $ID_{s_i}$ = 1 : 0.1n at the first time
slot $t_1$.

• Flooding phase: In this phase, each sensor broadcasts
a message containing its $ID_{s_i}$ so that nodes can discover
their neighbors and store them in their databases. Any node
that receives an incoming $ID_{s_i}$ considers the sending node
its neighbor. The broadcast
message also contains a flag equal to zero to indicate the
flooding phase.

• Sensing and data dissemination phase: In this phase,
when a sensor reads new data, it sends this data to
some of its neighboring nodes. The neighboring head
nodes disseminate the data in the network by exchanging
their neighbors' data among themselves, as shown in
Fig. 1. The head nodes are changed at each time slot
and repeated each period T.

• Data collection phase: In this phase, the base station
can query a small number of arbitrary nodes to retrieve the data
sensed by the n sensing nodes, and it estimates n
and sends the estimate to the first surviving node.



B. RHDCA Algorithm

In the PHDCA algorithm, we assumed that the selection of
head nodes is known in advance at each time slot t and that the head
nodes are repeated each period T. The disadvantage of this
algorithm is its topology dependence: its performance
depends on the distribution of head nodes, which in turn depends on
the network topology. We extended PHDCA to obtain RHDCA,
which randomly selects k head nodes at each time slot t.
The performance of RHDCA is topology independent because the
head nodes are selected randomly. The two algorithms differ in
the sensing and data dissemination phase, as
follows:

Sensing and data dissemination phase: In this phase, k
head nodes are selected randomly at each time slot t. The
k head nodes may not be repeated each period T. Also,
when a sensor reads new data, it sends this data to some
of its neighboring nodes. The neighboring head nodes
disseminate the data in the network by exchanging their
neighbors' data among themselves.
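
The random head selection of RHDCA can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the helper name `rhdca_heads` is hypothetical, node IDs are taken to be 1..n, and heads are drawn uniformly without replacement, which the text does not specify.

```python
import random


def rhdca_heads(n, k, rng):
    """Draw k distinct head-node IDs uniformly from 1..n for one time slot."""
    return sorted(rng.sample(range(1, n + 1), k))


# Illustrative values: n = 100 nodes, k = 10 heads (k/n = 10%), e = 10 slots.
rng = random.Random(42)
n, k, e = 100, 10, 10
schedule = [rhdca_heads(n, k, rng) for _ in range(e)]
```

Because each slot is drawn independently, a node may serve as head in several slots of the same period, or in none, which is why the head sets need not repeat every period T.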
Input: A sensor network with S = {s_1, ..., s_n} source
nodes, n source packets x_{s_1}, ..., x_{s_n}.

Output: Storage buffers y_1, y_2, ..., y_n for all sensors S.

t = 1; // index of the current time slot in the period T

foreach node u = 1 : n do
    if u ≤ 0.1n then
        u is a head node;
    end
end

foreach node u = 1 : n do
    Generate a packet containing ID_u and flag = 0, and
    broadcast this message to its set of neighbors;
    P_u = (ID_u, x_u, flag);
end

while surviving nodes remain do
    foreach node u = 1 : n do
        if u sensed new data then
            u sends this data to some of its neighbors,
            chosen randomly;
        end
    end

    foreach head node h = 1 : k do
        h and its neighboring head nodes exchange their
        neighbors' data with each other;
    end

    if t expired then
        // generate the new k head nodes as follows
        t++;
        foreach node u = 1 : n do
            if 0.1n(t - 1) < u ≤ 0.1nt then
                u is a head node;
            end
        end

        if t == e then
            t = 0;
            n = n_received; // update n with the estimated number
                            // of nodes received from the base station
        end
    end
end

Algorithm 1: PHDCA algorithm
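
The head rotation of Algorithm 1 can be sketched in a few lines. This is a hedged illustration only: the helper name `phdca_heads` is hypothetical, node IDs are assumed to be 1..n, k = 0.1n is assumed to be an integer, and the window bounds are read as k(t - 1) < u <= kt.

```python
def phdca_heads(n, k, t):
    """Head IDs for time slot t (1-based): all u with k*(t-1) < u <= k*t."""
    return list(range(k * (t - 1) + 1, min(k * t, n) + 1))


# Illustrative values: n = 100 nodes, k = 10 heads per slot, e = n / k slots.
n, k = 100, 10
e = n // k
schedule = {t: phdca_heads(n, k, t) for t in range(1, e + 1)}
covered = set().union(*schedule.values())
```

Under these assumptions every node serves as a head exactly once per period T, which is the fairness property the dynamic selection is meant to provide.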



IV. Analysis

In this section we analyze the proposed DCA's algorithms. 
Lemma 3: The probability that a set M of sensors has at
least one cluster head node is given by

$$\Pr(M \cap H \neq \emptyset) = 1 - \prod_{i=0}^{m-1} \left(1 - \frac{k}{n-i}\right), \tag{4}$$

where m = |M| is the number of nodes in M.

Proof: The number of ways in which the m nodes can be
drawn from the total number of nodes n is

$$\binom{n}{m} = \frac{n!}{m!\,(n-m)!}. \tag{5}$$

The number of ways in which no head nodes exist in the set M is
$\binom{n-k}{m}$. The probability that the set M has no cluster head
nodes is $\binom{n-k}{m} / \binom{n}{m}$. Hence, the probability that the set M has at
least one head node is

$$1 - \frac{\binom{n-k}{m}}{\binom{n}{m}} = 1 - \prod_{i=0}^{m-1} \left(1 - \frac{k}{n-i}\right).$$
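
The counting argument of this proof is easy to check numerically. The sketch below is illustrative only: the function names are hypothetical, and the product form is the standard expansion of the ratio of binomial coefficients, not something taken from the source.

```python
from math import comb, prod


def p_at_least_one_head(n, k, m):
    """1 - C(n-k, m) / C(n, m): chance that m drawn nodes include a head."""
    return 1 - comb(n - k, m) / comb(n, m)


def p_product_form(n, k, m):
    """Same probability written as 1 - prod_{i=0}^{m-1} (1 - k / (n - i))."""
    return 1 - prod(1 - k / (n - i) for i in range(m))


# Illustrative values: n = 300 nodes, k = 30 heads, a set of m = 8 nodes.
n, k, m = 300, 30, 8
closed = p_at_least_one_head(n, k, m)
product = p_product_form(n, k, m)
```

The two forms agree up to floating-point error, and the probability grows with m, so denser neighborhoods are more likely to contain a head node.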
Lemma 4: The probability that a set M of sensors has a
set Z of cluster head nodes is given by

$$\Pr(Z) = \frac{\binom{n-k}{m-z} \binom{k}{z}}{\binom{n}{m}}, \tag{6}$$

where z = |Z| is the number of nodes in Z.

Proof: The number of ways in which the m nodes can be
drawn from the n sensing nodes is $\binom{n}{m}$. From the Fundamental
Counting Theorem, the total number of ways in which z head
nodes and m − z non-head nodes can be drawn from the n
sensing nodes is $\binom{n-k}{m-z}\binom{k}{z}$. So, the probability that a set of
m sensors has z head nodes is

$$\Pr(Z) = \frac{\binom{n-k}{m-z} \binom{k}{z}}{\binom{n}{m}}.$$
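
The probability in Lemma 4 is a hypergeometric probability, which the following sketch evaluates. The function name `p_z_heads` is hypothetical, and the out-of-range guard is an addition for robustness rather than part of the lemma.

```python
from math import comb


def p_z_heads(n, k, m, z):
    """Probability that m nodes drawn from n contain exactly z of the k heads."""
    if z < 0 or z > m or z > k:
        return 0.0
    return comb(n - k, m - z) * comb(k, z) / comb(n, m)


# Illustrative values: n = 300 nodes, k = 30 heads, a set of m = 8 nodes.
n, k, m = 300, 30, 8
distribution = [p_z_heads(n, k, m, z) for z in range(m + 1)]
```

Summing over all feasible z gives 1, and the z = 0 term is the "no head node" probability used in the proof of Lemma 3.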



Definition 5 (Head energy consumption ($E_h$)): The head energy
consumption is the energy consumed at all nodes due to data dissemination in
the network N when all nodes have the same coverage range
and packet size.

Lemma 6: Let $\beta$ be the probability that a node $s_i$ has a set
Z of neighboring head nodes. From Equation (6), $\beta$ can be
given by

$$\beta = \frac{\binom{n-k}{d_{s_i}-z_{s_i}} \binom{k}{z_{s_i}}}{\binom{n}{d_{s_i}}}, \tag{7}$$

where $z_{s_i}$ is the number of neighboring head nodes of node
$s_i$ and $d_{s_i}$ is the degree of node $s_i$ when the set M represents
the neighboring nodes of the node $s_i$.
The total energy consumption is given by

$$E_h = \frac{ek}{n} \sum_{i=1}^{n} (\mu p_r + p_t)\, \beta\, \alpha_i\, z_{s_i}, \tag{8}$$

where $\alpha_i$ is the number of transmissions between the node $s_i$
and its neighbors, and $p_t$, $p_r$ are the energy costs of
transmitting and receiving one packet.

Proof: Let $c_r$ be the received energy cost of the n nodes due
to data dissemination, so

$$c_r = \frac{k}{n} \sum_{i=1}^{n} \mu p_r\, \beta\, \alpha_i\, z_{s_i}, \tag{9}$$

where k/n is the probability that a node $s_i$ is a head node.
Let $c_t$ be the transmitted energy cost of the n nodes due to data
dissemination, so

$$c_t = \frac{k}{n} \sum_{i=1}^{n} p_t\, \beta\, \alpha_i\, z_{s_i}. \tag{10}$$

Hence, the total energy consumption due to data dissemination
over e time slots is given by

$$E_h = e\,(c_r + c_t) = \frac{ek}{n} \sum_{i=1}^{n} (\mu p_r + p_t)\, \beta\, \alpha_i\, z_{s_i}. \tag{11}$$



Lemma 7: The total energy consumption at the sensing
nodes due to sending the sensed data to their neighbors is
given by

$$E_s = n(p_t + \mu p_r), \tag{12}$$

where all nodes have the same coverage range and packet size.

Proof: The energy consumed by the n nodes in sending
their sensed data is $n p_t$. The energy consumed by the n nodes in
receiving all packets is $\sum_{i=1}^{n} p_r d_{s_i}$. Hence, assuming that
each node updates its data once in each period T, the
energy consumption at the n sensing nodes is

$$E_s = \sum_{i=1}^{n} (p_t + d_{s_i} p_r) = n(p_t + \mu p_r). \tag{13}$$
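
The identity used in this proof, that summing the per-node cost over all nodes equals n times the cost at the mean degree, can be checked with a toy example. The degrees and packet energies below are made-up illustrative numbers, and the function name `sensing_energy` is hypothetical.

```python
def sensing_energy(degrees, p_t, p_r):
    """Sum over nodes of one transmission plus one reception per neighbor."""
    return sum(p_t + d * p_r for d in degrees)


# Toy network: four nodes with the listed degrees; energies in joules.
degrees = [3, 5, 2, 6]
p_t, p_r = 1.05e-4, 1.0e-4

n = len(degrees)
mu = sum(degrees) / n                 # mean degree
per_node_sum = sensing_energy(degrees, p_t, p_r)
closed_form = n * (p_t + mu * p_r)    # n (p_t + mu p_r)
```

Both expressions give the same total, since the sum of the degrees is n times the mean degree.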



Lemma 8: The optimum number of head nodes that gives 
minimum energy consumption is given by 



^opt 



\ 



nEh 



i=l 



(14) 



where, A= 

i=l 

Proof: The optimum number of head nodes that gives
minimum energy consumption can be derived by
differentiating Equation (8) as follows:

$$\frac{d}{d k_{opt}} (E_h) = 0. \tag{15}$$



V. Simulation and Performance Evaluations 

In this section we present some simulation results to illus- 
trate the performance of the proposed algorithms. 

Definition 9: Decoding Ratio ($\eta$) is the ratio between the
number of queried nodes h and the total number of sources
n:

$$\eta = \frac{h}{n}. \tag{16}$$

Definition 10: Successful Decoding Probability ($P_s$) is the
probability that the n source packets are all recovered from
the h queried nodes.

Fig. 2 shows the relation between the successful decoding
probability and the decoding ratio for different numbers of
sensing nodes n in the PHDCA and RHDCA algorithms.

Fig. 2(a) and Fig. 2(b) show that increasing the number
of network nodes n while fixing the coverage radius r of all
nodes improves the successful decoding
probability. As the number of nodes
increases, fewer sensors need to be queried
to recover the data with a reasonable success probability.
In particular, for n > 500, querying up to 10% of the nodes
reveals about 85% of the network data in PHDCA and about 92%
of the network data in RHDCA.

Fig. 2. The relation between the successful decoding probability and the
decoding ratio for n=100, n=200, n=300, n=500, n=1000 when A=100×100,
e=10, buffer size=40 and r=5: (a) PHDCA, (b) RHDCA.

Fig. 3. The energy consumption at each sensing node in network N
when A=100×100, n=300, e=10, buffer size=40, packet size=2 Kbits, node
energy=5 J and r=5: (a) PHDCA, (b) RHDCA.

Fig. 3 shows the energy consumed at each
node after the dissemination of data in the network N in the
PHDCA and RHDCA algorithms. From this figure we can
see that the energy consumption of the PHDCA algorithm is
better than that of the RHDCA algorithm. We
neglect the energy consumed at a sensing node in
sensing data, and each sensor node is assumed
to have an initial battery charge of 5 J. We calculated the energy
consumption according to [11], which assumes that the energy
consumption at a sensor node $s_i$ due to transmitting one packet
is given by

$$p_t = (50 \times 10^{-9} + 100 \times 10^{-12}\, r^2)\, \psi_{s_i}, \tag{17}$$

and the energy consumption at a sensor node $s_i$ due to
receiving one packet is given by

$$p_r = 50 \times 10^{-9}\, \psi_{s_i}, \tag{18}$$

where $\psi_{s_i}$ is the packet size of node $s_i$.
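
To give a feel for the magnitudes involved, the per-packet costs can be evaluated for the simulation settings. This sketch rests on several assumptions that are not stated explicitly in the text: the constants are read as 50 nJ/bit for the electronics and 100 pJ/bit/m^2 for the amplifier (the usual first-order radio model), the distance term is taken to be the squared coverage radius r = 5, and 2 Kbits is taken to mean 2000 bits. The names `tx_energy` and `rx_energy` are hypothetical.

```python
E_ELEC = 50e-9     # J per bit, transmitter/receiver electronics (assumed)
EPS_AMP = 100e-12  # J per bit per m^2, transmit amplifier (assumed)


def tx_energy(bits, r):
    """Energy to transmit one packet of `bits` bits over range r (metres)."""
    return (E_ELEC + EPS_AMP * r ** 2) * bits


def rx_energy(bits):
    """Energy to receive one packet of `bits` bits."""
    return E_ELEC * bits


packet_bits = 2000  # "2 Kbits" read as 2000 bits (assumption)
p_t = tx_energy(packet_bits, r=5)
p_r = rx_energy(packet_bits)
battery = 5.0       # initial node energy in joules, from the Fig. 3 settings
receptions_per_battery = battery / p_r
```

Under these assumptions one transmission costs about 0.105 mJ and one reception about 0.1 mJ, so a 5 J battery covers on the order of tens of thousands of packet operations.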

Definition 11: Death Rate (DR) is the ratio between the
number of dead nodes and the total number of sensing nodes
n:

$$DR = \frac{\text{number of dead nodes}}{n}. \tag{19}$$

Fig. 4 illustrates the relation between the death rate and
the total number of sensing nodes. Increasing the number of
network nodes n while fixing the field area
increases the death rate.

Fig. 5 shows the relation between the performance of data
collection and the elapsed time in the DCAs. As the elapsed
time increases, more nodes disappear from the network N
(i.e., DR increases) due to energy depletion. Hence,
data dissemination and data collection performance is
negatively affected by the dead nodes. From Fig. 5(a)
and Fig. 5(b) we can deduce that the data collection performance
of PHDCA over time is better than that of RHDCA.

VI. Discussions 

In this section we review work related to ours.

• The authors in [1] proposed a distributed data collection
algorithm to store and forward information obtained by
wireless sensor networks. They used n − k storage nodes
to collect the sensed data from the network, where k is
the number of sensor nodes, n is the total number of nodes and
(n−k)/n is 20%.

• The authors in ||2l, ifTSl . |I3] suggested two distributed 
storage algorithms for large-scale wireless sensor networks.
Each node randomly chooses one of its neighbors
and sends its data to it, to be passed on to another neighbor. Each packet can
travel a certain number of hops according to its time-to-live
counter. The receiving node decides with a random
probability whether to accept the incoming message.

Fig. 4. The relation between the death rate and number of nodes n in network
N when A=100×100, e=10, buffer size=40 and r=5: (a) PHDCA, (b) RHDCA.

Fig. 5. The performance of PHDCA and RHDCA algorithms along time
when A=100×100, n=300, e=10, buffer size=40 and r=5: (a) PHDCA, (b) RHDCA.

• The authors in [6] used a decentralized implementation
of Fountain codes that uses geographic routing, in which every
node has to know its location.

• The authors in [[Sj increased data persistence in wireless 
sensor networks using a novel technique called growth 
codes, i.e. increasing the amount of information that can 
be recovered at the sink nodes. 

• The authors in [10] used a centralized controller to select
cluster heads (CHs). Using a central control algorithm to form the
clusters can produce better clusters by dispersing the
CH nodes throughout the network.

• The authors in proposed a distributed, randomized 
clustering algorithm to organize the sensors in a wireless 
sensor network into clusters. 

VII. Conclusion and Future Work

In this paper, we presented two algorithms for data dissemination
and collection in wireless sensor networks. Given n
sensing nodes with limited buffers, we demonstrated schemes
to disseminate sensed data throughout the network with low
computational overhead. The proposed algorithms do not
assume any prior knowledge of routing tables or nodes' locations.
In addition, the time factor T can be selected to suit
the intended application and to minimize energy consumption.
Our future work will develop accurate practical algorithms to
optimize energy consumption in the sensor network.